Abstract

External or internal shocks may lead to the collapse of a system consisting of many agents. If the shock hits only one agent initially and causes it to fail, this can induce a cascade of failures among neighboring agents. Several critical constellations determine whether this cascade remains finite or reaches the size of the system, i.e. leads to systemic risk. We investigate the critical parameters for such cascades in a simple model, where agents are characterized by an individual threshold $\theta_i$ determining their capacity to handle a load $\alpha\theta_i$, with $1-\alpha$ being their safety margin. If agents fail, they redistribute their load equally to $K$ neighboring agents in a regular network. For three different threshold distributions $P(\theta)$, we derive analytical results for the size of the cascade, $X(t)$, which is regarded as a measure of systemic risk, and for the time when it stops. We focus on two different regimes: (i) EEE, an external extreme event, where the size of the shock is of the order of the total capacity of the network, and (ii) RIE, a random internal event, where the size of the shock is of the order of the capacity of an agent. We find that even for large extreme events that exceed the capacity of the network, finite cascades are still possible if a power-law threshold distribution is assumed. On the other hand, even small random fluctuations may lead to full cascades if critical conditions are met. Most importantly, we demonstrate that the size of the "big" shock is not the problem, as the systemic risk only varies slightly for changes of 10 to 50 percent of the external shock. Systemic risk depends much more on ingredients such as the network topology, the safety margin and the threshold distribution, which gives hints on how to reduce systemic risk.



1 Introduction 



Current research on systemic risk can be roughly divided into two different strands, each with its own focus: (i) the probability of extreme events which can cause a breakdown of the system, (ii) the mechanisms which can amplify the failure of a few system elements into a failure cascade of the size of the system. The former line of research assumes that systemic risk is caused by external events, e.g. big earthquakes, tsunamis, or meteor impacts. Thus, in addition to the likelihood of extreme events, another interesting question regards the response of the system to such perturbations, i.e. its capability to absorb shocks of a given size. The latter research area, on the other hand, sees systemic failure as an endogenous feature that basically emerges from the non-linear interaction of the constituents, i.e. from how they redistribute, and possibly amplify, load internally.

In both approaches, the likelihood of a systemic breakdown can only be determined by considering the internal dynamics of system elements, denoted as agents in this paper, such as their capacity to resist shocks, their time-bound interaction with neighbors, and their dependence on macroscopic feedback mechanisms, such as coupling to the macroscopic state of the system. Only in rare cases can the dynamics of systemic risk be reduced to mere topological aspects, such as the diversity in the number of neighbors, the role of hubs, etc. In this paper, we combine the two research questions mentioned above: on the one hand, we are interested in the critical size of an external shock that may lead to the collapse of the system. At the same time, we take into account that such critical values depend on the safety margins of the system elements and on the details of their interaction when redistributing load internally. We also investigate how these cascading dynamics are affected by the structural features of the network (level of connectivity, topological heterogeneities) and by individual properties of the agents, such as the probability distributions of the failure thresholds. Such insights can directly benefit a robust system design by means of individualization of agents (i.e. designing agents with optimal heterogeneity).

Given the importance of such problems for social, economic and technological systems, the topic is already discussed in a wide range of scientific literature. Several modeling frameworks were recently proposed [1, 2, 3]. The complex network approach was also used to describe cascading processes in power grids and in Internet services [4, 5], and was also applied to data storage services [6]. Importantly, similar agent-based approaches were developed to model avalanches of defaults among financial institutions [15].

Our paper is organized as follows: in the next section we introduce the agent-based model studied, which specifies, e.g., the agents' fragility and their interaction by means of a load redistribution mechanism. This allows us to define a measure for systemic risk on the macroscopic level. In Section 3 we develop an analytical framework that allows us to unveil the dynamics of systemic risk based on cascading processes. In Section 3.2, we discuss the critical conditions for systemic risk to emerge. Later, in Section 3.3, we study when agents can be considered systemic because of their importance. The paper finishes with some conclusions in Section 4, which also allow for a generalized picture of how to prevent systemic risk.
2 Analytical approach to systemic risk
2.1 Description of the model 

Net fragility. In a recent paper [1], a framework to model systemic risk by means of an agent-based approach was developed. In this framework, each agent $r$ is characterized by three individual variables: a discrete variable $s_r(t) \in \{0, 1\}$, which describes its state at a discrete time $t$, i.e. $s_r(t) = 0$ for an operating state and $s_r(t) = 1$ for a failed state, and two continuous variables, the threshold $\theta_r$ and the load $\phi_r$. The threshold describes the individual 'capacity' of an agent: it defines how much load an agent can carry before it fails. The variable $\phi_r$, on the other hand, describes the load which is exerted on an agent.

We note that the load can change in time, e.g. through systemic feedback, and that it further depends on the state $s$ of other agents and on the network of interactions, described by the adjacency matrix $A$. Written this way, the load also depends on how it is exchanged between agents; a special case will be discussed below. We define that agent $r$ fails if its net fragility $z_r(t)$,

$$ z_r(t) = \phi_r(t, A, s) - \theta_r , \tag{1} $$

is equal to or larger than zero. In a deterministic model, the dynamics of an agent is thus given by

$$ s_r(t+1) = \Theta\big(z_r(t)\big), \tag{2} $$

where $\Theta(\cdot)$ is the Heaviside function, with the convention $\Theta(0) = 1$. Clearly, the dynamics only depends on the net fragility, i.e. on the relative distance between load and threshold. Nevertheless, a distinction between these two individual variables is very useful, as it allows us to conceptually distinguish between internal and external influences on the failure.



Systemic risk. We now define the central measure of this paper, systemic risk, as the fraction of failed agents at a given point in time. For a system composed of $N$ agents, it reads

$$ X(t) = \frac{1}{N} \sum_{r=1}^{N} s_r(t) = \int_{0}^{\infty} p_{z(t)}(z)\, dz ; \tag{3} $$

$p_{z(t)}(z)$ represents the density of agents with a net fragility $z$ at time $t$; the integral runs over the agents whose net fragility is non-negative. Failures in a subset of agents will result in cascading processes over the network of interaction, which change the fragility of other agents in the course of time. This can be expressed by the recursive dynamics $p_{z(t+1)} = \mathcal{F}(p_{z(t)})$, where $\mathcal{F}$ is some function that describes how the load of failing agents is redistributed, depending on the interaction mechanisms. With this, by specifying the initial condition $p_{z(0)}$, it is possible to compute $X(t)$ for a deterministic dynamics.
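As a minimal sketch of Eqs. (1)-(3), the snippet below treats the loads $\phi_r$ as plain numbers (their dependence on $t$, $A$ and $s$ is not modeled) and uses the convention $\Theta(0) = 1$, so that $z_r = 0$ already counts as a failure. The function names are hypothetical and only serve as illustration.

```python
from typing import List, Sequence


def heaviside(x: float) -> int:
    """Step function with Theta(0) = 1, so that z = 0 counts as a failure."""
    return 1 if x >= 0 else 0


def next_states(loads: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Eqs. (1)-(2): s_r(t+1) = Theta(z_r(t)) with z_r(t) = phi_r(t) - theta_r."""
    return [heaviside(phi - theta) for phi, theta in zip(loads, thresholds)]


def systemic_risk(states: Sequence[int]) -> float:
    """Eq. (3): X(t) = (1/N) * sum_r s_r(t), the fraction of failed agents."""
    return sum(states) / len(states)


loads = [0.5, 1.0, 1.2, 0.9]
thresholds = [1.0, 1.0, 1.0, 1.0]
states = next_states(loads, thresholds)
print(states, systemic_risk(states))  # [0, 1, 1, 0] 0.5
```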
In Ref. [1], $X(t)$ was calculated by making suitable assumptions about the distribution of the net fragility, $p_{z(t)}$, and the initial conditions, $p_{z(0)}$, for the particular case of a fully connected network, i.e. each agent interacts with everyone else. Specifically, an initial condition $p_{z(0)} \sim \mathcal{N}(\bar\theta, \sigma)$ was used, i.e. the initial fragility of agents is normally distributed with mean $\bar\theta$ and standard deviation $\sigma$. This implies that the initial fraction of failed agents at time $t = 0$ is given by $X(0) = \Phi_{\bar\theta,\sigma}(0)$, where $\Phi_{\bar\theta,\sigma}(\cdot)$ denotes the cumulative distribution function of the normal distribution, i.e. it gives the (normalized) number of agents with an initial net fragility, defined in Eq. (1), equal to or larger than zero. The authors calculated the size of cascades, measured by the final fraction of failed agents, for different interaction mechanisms. Remarkably, it was found that systemic risk depends on the standard deviation $\sigma$ of the distribution $p_{z(0)}$ in a non-monotonic manner. This means that systemic risk can decrease if the agents become more heterogeneous, i.e. if their individual thresholds become more different. On the other hand, for homogeneous agents characterized by the same threshold, a first-order phase transition was found between no systemic risk and complete failure.

Initial fragility. We use these previous findings as a reference point, but we extend the model in different ways. First of all, instead of a normal distribution for the initial net fragility $z_r(0) = \phi_r(0) - \theta_r$, we assume a fixed relation between the initial load $\phi_r(0)$ and the threshold $\theta_r$:

$$ \phi_r(0) = \alpha\, \theta_r \qquad (r = 1, \dots, N;\ r \neq i). \tag{4} $$

The parameter $\alpha$ is a constant, equal for all agents. Only for one agent $i$, instead of the fixed relation (4), we define $\phi_i(0) = \phi^* > \theta_i$. Thus, we consider that initially only agent $i$ is in a critical condition, whereas with $\alpha < 1$ all other agents are initially capable of handling the load assigned to them. Hence, unlike with the distribution of initial loads used in [1], we do not have an initial failure cascade. Instead, the initial condition for the systemic risk is simply $X(0) = 1/N$.
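The initial condition of Eq. (4) can be sketched as follows, assuming a plain list of thresholds indexed by agent; `initial_loads` and `initial_systemic_risk` are hypothetical helpers. With $\alpha < 1$ and positive thresholds, only the shocked agent $i$ has a non-negative net fragility, which reproduces $X(0) = 1/N$.

```python
from typing import List, Sequence


def initial_loads(thresholds: Sequence[float], alpha: float, i: int, phi_star: float) -> List[float]:
    """Eq. (4): phi_r(0) = alpha * theta_r for r != i, and phi_i(0) = phi_star > theta_i."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) so that 1 - alpha is a positive safety margin")
    if phi_star <= thresholds[i]:
        raise ValueError("the shock phi_star must exceed the threshold of agent i")
    loads = [alpha * theta for theta in thresholds]
    loads[i] = phi_star
    return loads


def initial_systemic_risk(loads: Sequence[float], thresholds: Sequence[float]) -> float:
    """X(0): fraction of agents whose net fragility phi_r(0) - theta_r is >= 0."""
    failed = sum(1 for phi, theta in zip(loads, thresholds) if phi - theta >= 0)
    return failed / len(thresholds)


thetas = [1.0, 2.0, 3.0, 4.0]
loads = initial_loads(thetas, alpha=0.5, i=2, phi_star=5.0)
print(loads, initial_systemic_risk(loads, thetas))  # [0.5, 1.0, 5.0, 2.0] 0.25
```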

The value $1 - \alpha$ can then be regarded as the available capacity (or safety margin) of the agents before they fail if their load is increased. A fixed relation between load and threshold was first used in [1] to describe cascading processes in power grids and the Internet (see also [5, 10]). It basically reflects the situation of many socio-technical systems in which the capacity of agents, limited by cost, is usually designed ad hoc to handle the load under normal conditions. We will later vary the safety margin $1 - \alpha$ to determine how the severity of cascading failures depends on it.

Threshold distribution. With these considerations, only the threshold distribution $P(\theta)$ remains to be specified to complete the initial conditions. It is worth remarking that the capacities of the agents, in contrast with other studies found in the literature so far, are decoupled from the topological features of the network connecting them. Here we will use three different assumptions for both analytical calculations and computer simulations:

(a) a delta distribution $P(\theta) = \delta(\theta - \bar\theta)$, where $\delta(\cdot)$ is the Dirac delta function, i.e. all agents have the same threshold $\bar\theta$,

(b) a uniform distribution $P(\theta) = U(\bar\theta - \sigma, \bar\theta + \sigma)$ with mean $\bar\theta$ and half-width $\sigma$, i.e. all agents have different, but comparable thresholds in the interval $[\bar\theta - \sigma, \bar\theta + \sigma]$. For all further calculations we define $\theta_{\min} = \bar\theta - \sigma$,

(c) a power-law distribution

$$ P(\theta) = (\gamma - 1)\, \theta_{\min}^{\gamma - 1}\, \theta^{-\gamma}, \qquad \theta \ge \theta_{\min}, \tag{5} $$

i.e. agents have thresholds that can differ by orders of magnitude. As the normalization depends on the value $\theta_{\min}$, we choose its numerical value in our further calculations to be comparable with the minimum value of the uniform distribution, $\theta_{\min} = \bar\theta - \sigma$.
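For simulations, thresholds can be drawn from the three distributions (a)-(c). The sketch below uses Python's `random` module; the power-law case uses inverse-transform sampling, which follows from the cumulative distribution $1 - (\theta/\theta_{\min})^{1-\gamma}$ of Eq. (5), i.e. $\theta = \theta_{\min}(1-u)^{-1/(\gamma-1)}$ for $u$ uniform in $[0, 1)$. The function name and the default parameter values are illustrative assumptions, and `sigma` stands for the half-width of the uniform distribution.

```python
import random
from typing import List, Optional


def sample_thresholds(kind: str, n: int, theta_bar: float = 1.0, sigma: float = 0.5,
                      gamma: float = 1.5, rng: Optional[random.Random] = None) -> List[float]:
    """Draw n thresholds from (a) delta, (b) uniform or (c) power-law P(theta)."""
    rng = rng or random.Random()
    theta_min = theta_bar - sigma  # lower bound shared by the uniform and power-law cases
    if kind == "delta":
        return [theta_bar] * n
    if kind == "uniform":
        return [rng.uniform(theta_bar - sigma, theta_bar + sigma) for _ in range(n)]
    if kind == "powerlaw":
        if gamma <= 1.0:
            raise ValueError("Eq. (5) is normalizable only for gamma > 1")
        # Inverse transform: 1 - u lies in (0, 1], so the negative power is always finite.
        return [theta_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]
    raise ValueError(f"unknown distribution: {kind}")


rng = random.Random(42)
print(min(sample_thresholds("powerlaw", 1000, rng=rng)))  # never below theta_min = 0.5
```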



Agent interaction. In order to describe the agent interaction, we use the network approach in which agents are represented by nodes and interactions by links between agents. I.e., the network topology specifies which other agents a particular one interacts with. This can be statistically described by the degree distribution $P(k)$, for which we will use in this paper only $P(k) = \delta_{k,K}$, i.e. a regular network in which each agent interacts with $K$ other agents. The fully connected network is a special case with $K = N - 1$.

Secondly, we have to specify how agents interact through these links. Here, we assume a load redistribution mechanism in which the initially failing agent $i$ shares its load $\phi_i(0) = \phi^*$ equally among its $K$ neighboring agents (labeled $j$), see Fig. 1. That means that for each of these agents, the own load $\phi_j(t)$ increases in the next time step $(t+1)$ by an amount of $\phi^*/K$. If this addition leads to a non-negative net fragility $z_j(t+1) = \phi_j(t+1) - \theta_j$, agents $j$ fail as well and redistribute their load $\phi_j(t+1)$ equally to their $K$ neighboring agents, and so on.
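The redistribution rule can be simulated shell by shell. The sketch below assumes a tree in which every agent passes load to $K$ not yet affected neighbors, so that each agent receives load from exactly one failing agent (the Bethe-lattice case discussed later, with $K(t) = K^t$); the full current load of a failing agent is split equally among those $K$ neighbors. Thresholds are drawn on the fly from a user-supplied sampler. `tree_cascade` and its arguments are hypothetical names, and the loops present in lattices such as the one of Fig. 1 are ignored.

```python
import random
from typing import Callable, List


def tree_cascade(phi_star: float, alpha: float, K: int, max_depth: int,
                 sample_theta: Callable[[random.Random], float],
                 rng: random.Random) -> List[int]:
    """Return F(t), the number of agents failing at t = 0, 1, ..., on a K-ary tree.

    Agent i fails at t = 0 with load phi_star. A neighbor with threshold theta
    carries alpha * theta (Eq. (4)) plus phi_parent / K and fails if this total
    is >= theta. The simulation stops when a shell has no failures.
    """
    failing_loads = [phi_star]   # loads of the agents failing in the current shell
    F = [1]
    for _ in range(max_depth):
        next_loads = []
        for phi_parent in failing_loads:
            for _ in range(K):
                theta = sample_theta(rng)
                load = alpha * theta + phi_parent / K
                if load >= theta:
                    next_loads.append(load)
        F.append(len(next_loads))
        failing_loads = next_loads
        if not failing_loads:
            break
    return F


# Homogeneous thresholds theta = 1, K = 4, alpha = 0.5, phi_star = 10.
print(tree_cascade(10.0, 0.5, 4, 10, lambda r: 1.0, random.Random(0)))  # [1, 4, 16, 0]
```

With homogeneous thresholds all agents of a shell behave identically, so the result is deterministic; for heterogeneous thresholds, repeated runs with different seeds give a sample of cascade sizes.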



Cascade sizes. This way, failure cascades can occur in the course of time, and we are interested in their relative size and the probability distribution of their occurrence, $P(X(t))$. For this calculation, which will be done in Sect. 3.2, we define the fraction $f(t)$ of failing agents at each time step $t$ and the number $F(t)$ of failing agents during the same time interval as:

$$ f(t) = \frac{1}{K(t)} \sum_{r=1}^{N} \big[ s_r(t) - s_r(t-1) \big]; \qquad F(t) = K(t)\, f(t); \qquad f(0) = 1. \tag{6} $$
Figure 1: Regular lattice with $K = 4$, where agent $i$ is hit initially ($t = 0$) by an external shock of size $\phi^*$. If $i$ fails, it distributes this load to its $K$ nearest neighbors in the next time step ($t = 1$). If they fail, they distribute their load to their $K$ nearest neighbors in the next time step ($t = 2$), which are the $2K$ second-nearest neighbors of $i$, etc.

$K(t)$ gives the number of agents that are hit by the cascade at time $t$, i.e. that are located in the $t$-th neighborhood of agent $i$, which failed initially. Hence, with the model of Fig. 1 in mind, $K(t)$ is the number of agents that can potentially fail during time step $t$. Depending on the topology of the regular network, there are two limiting cases for how $K(t)$ grows in time. $K(t) \propto K t$ holds in regular lattices, where the interface grows linearly with distance. On the other hand, for Bethe lattices, tree-like structures and random topologies in which loops are neglected [2], the number of nodes at distance $t$ is $K(t) = K^t$.

Using the definition (6), we can calculate the size of the cascade at time $t$, which is equal to the systemic risk $X(t)$, as:

$$ X(t) = \frac{1}{N} \sum_{\tau=0}^{t} F(\tau) = \frac{1}{N} \Big[ 1 + \sum_{\tau=1}^{t} K(\tau)\, f(\tau) \Big]. \tag{7} $$

In general, Eq. (7) cannot be solved analytically. However, in the following sections we will derive analytical expressions for $f(t)$, assuming different distributions of agents' thresholds.
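Given the sequence $f(0), f(1), \dots$, Eq. (7) is a running sum. In the sketch below, $K(t) = K^t$ is used for the Bethe lattice and $K(t) = K t$ for a regular lattice, where the proportionality constant of $K(t) \propto K t$ is set to one as an assumption; `shell_size` and `systemic_risk_series` are hypothetical helper names.

```python
from typing import List, Sequence


def shell_size(t: int, K: int, topology: str = "bethe") -> int:
    """K(t): agents at distance t from the initially failing agent i, with K(0) = 1."""
    if t == 0:
        return 1
    if topology == "bethe":
        return K ** t
    if topology == "lattice":
        return K * t  # K(t) proportional to K t; prefactor 1 assumed
    raise ValueError(f"unknown topology: {topology}")


def systemic_risk_series(f: Sequence[float], K: int, N: int, topology: str = "bethe") -> List[float]:
    """Eq. (7): X(t) = (1/N) * sum_{tau=0}^{t} K(tau) f(tau), with f(0) = 1."""
    X, cumulative = [], 0.0
    for tau, f_tau in enumerate(f):
        cumulative += shell_size(tau, K, topology) * f_tau
        X.append(cumulative / N)
    return X


print(systemic_risk_series([1, 1, 1, 0], K=4, N=100))  # [0.01, 0.05, 0.21, 0.21]
```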
Finite versus infinite cascades. According to the definitions above, a total failure occurs if $X(t) = 1$. In a finite system, this will happen at a finite time $t'$, while in an infinite system this final state is reached only asymptotically, $t' \to \infty$. However, in a finite system a cascade can stop even for $X(t) < 1$ if the potential number of failing agents reaches the system size at a given time $t^\circ$,

$$ \sum_{t=0}^{t^\circ} K(t) \ge N. \tag{8} $$

Yet, there is a third case to be considered, namely that the cascade stops at a finite time $t^*$, even if $X(t) < 1$ and $t < t^\circ$, simply because the redistribution of loads to the nearest neighbors does not cause further failures. This is expressed by the condition $f(t^*) = 0$.

We will refer to an "infinite" cascade if $X(t') = 1$, which means every agent in the system has failed. On the other hand, a "finite" cascade occurs either if it stops at time $t^* < t^\circ$, or if the redistribution of load has reached the system size, Eq. (8), without causing all agents to fail. Consequently, finite cascades stop at time $t' = \min(t^\circ, t^*)$. We note that, according to our definition, Eq. (7), systemic risk refers to finite cascades as well, not just to $X \to 1$. Precisely, we are interested in the distribution $P(X(t'))$, i.e. the density of failed agents at the time by which cascades end, regardless of the cause.
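The finite-size limit $t^\circ$ of Eq. (8) and the stopping time $t' = \min(t^\circ, t^*)$ can be computed directly. This sketch again assumes $K(t) = K^t$ (Bethe lattice) or $K(t) = K t$ (regular lattice, prefactor set to one); the helper names are hypothetical, and $t^*$ has to be supplied, e.g. from one of the closed-form expressions derived in Section 3.

```python
def saturation_time(K: int, N: int, topology: str = "bethe") -> int:
    """Eq. (8): smallest t0 such that sum_{t=0}^{t0} K(t) >= N."""
    if K < 1 or N < 1:
        raise ValueError("K and N must be positive")
    if topology not in ("bethe", "lattice"):
        raise ValueError(f"unknown topology: {topology}")
    total, t = 0, 0
    while True:
        if t == 0:
            total += 1                 # K(0) = 1: the initially failing agent i
        elif topology == "bethe":
            total += K ** t
        else:
            total += K * t
        if total >= N:
            return t
        t += 1


def stopping_time(t_star: float, t_zero: int) -> float:
    """t' = min(t0, t*): the cascade ends at whichever limit is reached first."""
    return min(t_zero, t_star)


print(saturation_time(4, 100), saturation_time(4, 100, "lattice"))  # 4 7
```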



Network capacity. In order to put the size of the initial shock into perspective, we refer to the total capacity of the network to absorb shocks, which depends on the safety margin $(1 - \alpha)$, the total number of nodes, and the threshold distribution $P(\theta)$. Thus, the capacity $Q$ that the system could a priori absorb during the cascade is simply given by

$$ Q = N (1 - \alpha) \int d\theta\, \theta\, P(\theta). \tag{9} $$

If the threshold distribution has a defined mean value, $\bar\theta$, this expression reduces to $Q = N (1 - \alpha)\, \bar\theta$. On the other hand, for a normalized power-law distribution with a minimum threshold value $\theta_{\min}$, the mean value is only defined for $\gamma > 2$. For $\gamma < 2$, a simple argument [11] shows that for a finite system the expected value can still be computed. The result is

$$ Q = N (1 - \alpha)\, \theta_{\min} \times \begin{cases} \dfrac{\gamma - 1}{2 - \gamma} \left( N^{2 - \gamma} - 1 \right) & \text{if } \gamma < 2, \\[2ex] \dfrac{\gamma - 1}{\gamma - 2} & \text{if } \gamma > 2. \end{cases} \tag{10} $$



It is worth noticing that for the delta threshold distribution, the uniform distribution (with $\bar\theta \sim \sigma$) and the power-law distribution with $\gamma > 2$, the network capacity $Q$ is of the same order of magnitude. In Fig. 2 we show the network capacity $Q$ for the power-law distribution as a function of $\gamma$ and $\alpha$. Precisely, it takes the same numerical value in all three cases, $Q = Q_u$, if $\gamma = 1.5$ and $\bar\theta = 2\theta_{\min}$, as used for the numerical calculations. However, for the power-law distribution with $\gamma < 2$, the network capacity becomes much larger than in the three other cases because of the additional dependence on the number of agents, $N^{2-\gamma}$. Choosing $\gamma = 1.5$ for the numerical calculations later implies that, compared to the uniform case, we have $Q \propto \sqrt{N}\, Q_u$.
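As a rough numerical check of the statement that the capacities are of the same order, the sketch below evaluates Eq. (9) for the delta and uniform cases ($Q = N(1-\alpha)\bar\theta$) and for the power law with $\gamma > 2$, whose mean is $\theta_{\min}(\gamma-1)/(\gamma-2)$. It also evaluates $Q$ for a finite sample as $(1-\alpha)\sum_r \theta_r$, which is one possible finite-$N$ reading of Eq. (9) and not necessarily the argument of [11]. Function names are hypothetical.

```python
from typing import Sequence


def capacity_mean(N: int, alpha: float, mean_theta: float) -> float:
    """Eq. (9) for a distribution with a finite mean: Q = N (1 - alpha) * mean(theta)."""
    return N * (1.0 - alpha) * mean_theta


def powerlaw_mean(theta_min: float, gamma: float) -> float:
    """Mean of Eq. (5), which exists only for gamma > 2."""
    if gamma <= 2.0:
        raise ValueError("the mean of the power law diverges for gamma <= 2")
    return theta_min * (gamma - 1.0) / (gamma - 2.0)


def capacity_sample(alpha: float, thresholds: Sequence[float]) -> float:
    """Finite-sample version of Eq. (9): Q = (1 - alpha) * sum_r theta_r."""
    return (1.0 - alpha) * sum(thresholds)


N, alpha = 100, 0.5
print(capacity_mean(N, alpha, 1.0))                       # delta or uniform with mean 1: 50.0
print(capacity_mean(N, alpha, powerlaw_mean(0.5, 3.0)))  # power law with gamma = 3: 50.0
```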

In this paper, depending on the magnitude of the initial shock, we distinguish between two 
different regimes: 

(i) EEE, the extreme exogenous event, results in a very large $\phi^*$ which is of the order of $Q$, i.e. much larger than the capacity of the initially failing agent (or the average capacity $\bar\theta$ of agents): $\phi^* \sim Q \gg \bar\theta$. In this case, it is no surprise that agents involved in the redistribution of load will fail, at least in an early phase. Hence, we are mostly interested in the conditions under which cascades may stop before they have reached the size of the system.

(ii) RIE, the random internal event, assumes that initially one randomly chosen agent $i$ faces a load that is slightly larger than its own capacity $\theta_i$, drawn from the distribution $P(\theta)$, i.e. $\phi^* \sim \bar\theta \ll Q$. This is likely to happen through a random fluctuation of the load $\phi_i$ that exceeds the threshold, rather than through a big impact on the system. In this case, we are interested in the conditions under which cascades occur at all.
2.2 Conditions for failing nearest neighbors

We assume that, at $t = 0$, a single randomly chosen agent $i \in \{1, \dots, N\}$ fails because of an initial shock, i.e. $\phi_i(0) = \phi^* > \theta_i$. According to the redistribution mechanism described above, this failure will increase the fragility of the nearest neighbors of $i$, labeled $j \in nn(i)$ (see Fig. 1). Agent $j$ fails if its net fragility becomes non-negative, i.e.:

$$ \phi_j(1) = \phi_j(0) + \frac{\phi^*}{K} \ge \theta_j , \tag{11} $$

which together with Eq. (4) leads to the critical condition for the failure of agent $j$,

$$ \theta_j \le \theta_c^{(1)}(\phi^*) = \frac{\phi^*}{K(1-\alpha)}. \tag{12} $$

Here $\theta_c^{(1)}(\phi^*)$ defines the critical threshold for the first-order neighborhood of agent $i$ or, equivalently, the critical threshold at time $t = 1$. Agents with a threshold between $\theta_{\min}$ and $\theta_c^{(1)}$ will fail; hence the fraction of failing agents at time $t = 1$ reads:

$$ f(1) = \int_{\theta_{\min}}^{\theta_c^{(1)}} d\theta\, P(\theta). \tag{13} $$

This fraction depends on the threshold distribution $P(\theta)$, so explicit calculations will be given in the next section. At the moment we just assume that at least one agent $j$ has failed, i.e. there will be a cascade to the next neighborhood (cf. Fig. 1).
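The step from Eq. (11) to Eq. (12) can be checked numerically: for sampled thresholds, testing the load condition $\alpha\theta_j + \phi^*/K \ge \theta_j$ directly and comparing $\theta_j$ with $\theta_c^{(1)} = \phi^*/[K(1-\alpha)]$ select the same agents, and their share estimates $f(1)$ of Eq. (13). The uniform sampler, the parameter values and the function names are illustrative assumptions.

```python
import random
from typing import Sequence, Tuple


def critical_threshold_first(phi_star: float, K: int, alpha: float) -> float:
    """Eq. (12): theta_c^(1) = phi_star / (K (1 - alpha))."""
    return phi_star / (K * (1.0 - alpha))


def first_shell_failures(phi_star: float, K: int, alpha: float,
                         thresholds: Sequence[float]) -> Tuple[int, int]:
    """Count neighbors failing at t = 1, once via Eq. (11) and once via Eq. (12)."""
    theta_c = critical_threshold_first(phi_star, K, alpha)
    via_load = sum(1 for th in thresholds if alpha * th + phi_star / K >= th)
    via_threshold = sum(1 for th in thresholds if th <= theta_c)
    return via_load, via_threshold


rng = random.Random(1)
thetas = [rng.uniform(0.5, 1.5) for _ in range(10_000)]  # uniform, theta_bar = 1, sigma = 0.5
a, b = first_shell_failures(phi_star=2.0, K=4, alpha=0.5, thresholds=thetas)
print(a == b, a / len(thetas))  # both counts agree; the share estimates f(1), close to 0.5 here
```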

Let us denote the agents failing in the first step by $j^* \in nn(i)$. Their load $\phi_{j^*}(1) \ge \theta_{j^*}$ will be redistributed to their nearest neighbors, labeled $k \in nn(j^*)$. Following the reasoning used for Eq. (11), we obtain for the load of agents $k$ at time $t = 2$:

$$ \phi_k(2) = \phi_k(1) + \sum_{j^* \in nn(k)} \frac{\phi_{j^*}(1)}{K} = \alpha\theta_k + \frac{1}{K} \sum_{j^* \in nn(k)} \left( \phi_{j^*}(0) + \frac{\phi^*}{K} \right). \tag{14} $$

The summation is performed over the whole set of failed agents $j^*$ that belong to the neighborhood of $k$, their load being $\phi_{j^*}(0) = \alpha\theta_{j^*}$.

The exact number of agents $j^* \in nn(k)$ depends on the topological properties of the network considered. For example, in square and hexagonal lattices, some second-nearest neighbors of $i$, i.e. agents $k$, have more than one link to agents $j$ in the nearest neighborhood. E.g. for the hexagonal lattice, half of the agents at level $k$ have two links to agents $j$, whereas the other half has only one. In general, for regular lattices, this number will be between one and two. In this paper, we restrict our analysis to the case of a single failing node $j^*$ in the neighborhood of $k$, which is the case for Bethe lattices or sparse random regular networks [12]. The theory can be extended to other regular geometries in a straightforward manner. With these considerations in mind, Eq. (14) becomes

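$$ \phi_k(2) = \alpha\theta_k + \frac{1}{K} \left( \alpha\theta_{j^*} + \frac{\phi^*}{K} \right). \tag{15} $$

From Eq. (15) we obtain the critical condition for the failure of agent $k$, i.e. for its net fragility to become non-negative:

$$ \theta_k \le \theta_c^{(2)} = \frac{\alpha\theta_{j^*} + \phi^*/K}{K(1-\alpha)}. \tag{16} $$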

This expression for the critical threshold at $t = 2$, i.e. in the second-order neighborhood of $i$, comprises two redistribution processes: on the one hand, that from the agents $j^*$ failing at $t = 1$ and, on the other, the redistribution of the initial load $\phi^*$ from agent $i$ failing at time $t = 0$. The fraction of failed agents at time $t = 2$ ($k \in nn(j^*)$) is then given by

$$ f(2) = \int_{\theta_{\min}}^{\theta_c^{(1)}} d\theta^{(1)}\, P(\theta^{(1)}) \int_{\theta_{\min}}^{\theta_c^{(2)}(\theta^{(1)})} d\theta^{(2)}\, P(\theta^{(2)}), \tag{17} $$

where $\theta_c^{(2)}(\theta^{(1)})$ indicates that the critical threshold at $t = 2$ depends on the load of the failing agents $j^*$ at time $t = 1$, which does not need to be equal for every $j^*$, but depends on the topology.
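Equation (17) is a nested integral that can be evaluated numerically for a given $P(\theta)$. The sketch below uses a midpoint rule for the outer integral over $\theta^{(1)}$ and the closed-form cumulative distribution for the inner one, with $\theta_c^{(2)}(\theta^{(1)})$ taken from Eq. (16). It assumes the uniform distribution with half-width `sigma`; the function names and the grid size are arbitrary choices.

```python
def uniform_pdf(x: float, lo: float, hi: float) -> float:
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def uniform_cdf(x: float, lo: float, hi: float) -> float:
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def f2_uniform(phi_star: float, alpha: float, K: int, theta_bar: float, sigma: float,
               n: int = 20_000) -> float:
    """Eq. (17) with theta_c^(1) from Eq. (12) and theta_c^(2)(theta^(1)) from Eq. (16)."""
    lo, hi = theta_bar - sigma, theta_bar + sigma
    theta_c1 = phi_star / (K * (1.0 - alpha))
    upper = min(theta_c1, hi)            # outer integral: agents j* that failed at t = 1
    if upper <= lo:
        return 0.0
    h = (upper - lo) / n
    total = 0.0
    for m in range(n):
        th1 = lo + (m + 0.5) * h         # midpoint of the m-th cell
        theta_c2 = (alpha * th1 + phi_star / K) / (K * (1.0 - alpha))
        total += uniform_pdf(th1, lo, hi) * uniform_cdf(theta_c2, lo, hi) * h
    return total


# Here K (1 - alpha) = 1, and the exact value of the double integral is 0.1875.
print(round(f2_uniform(phi_star=1.0, alpha=0.5, K=2, theta_bar=1.0, sigma=0.5), 6))  # 0.1875
```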

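Using the same reasoning for the different time steps of the cascade, we obtain a general expression for the fraction of agents failing during time step $t$ (which are the neighbors of agents failing at $t-1$):

$$ f(t) = \int_{\theta_{\min}}^{\theta_c^{(1)}} d\theta^{(1)} P(\theta^{(1)}) \cdots \int_{\theta_{\min}}^{\theta_c^{(t-1)}} d\theta^{(t-1)} P(\theta^{(t-1)}) \int_{\theta_{\min}}^{\theta_c^{(t)}} d\theta^{(t)} P(\theta^{(t)}). \tag{18} $$

The critical threshold $\theta_c^{(t)}(\phi^{(t-1)})$ at time $t$ depends on the load $\phi^{(t-1)}$ of the failing agent at time $t-1$ as follows:

$$ \theta^{(t)} \le \theta_c^{(t)}\big(\phi^{(t-1)}\big) = \frac{\phi^{(t-1)}}{K(1-\alpha)}, \tag{19} $$

where $\phi^{(t-1)} = \alpha\theta^{(t-1)} + \phi^{(t-2)}/K$ and $\phi^{(0)} = \phi^*$, in agreement with Eqs. (12) and (16). This is a recursive equation, i.e. $f(t)$ depends on the load redistributed by all the agents that failed along the path connecting the initially failing agent $i$ with the agents failing at time $t$. However, Eq. (18) cannot be computed in general; thus, in the following sections, we will study some cases in which this equation can be reduced and solved.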



2.3 Threshold approximation in the RIE and EEE regimes 



Inequality (19) is an important result for understanding the propagation of cascades, which holds for both regimes introduced above: the EEE regime, where the external shock dominates the dynamics, and the RIE regime, where small random events inside the first failing agent may trigger the cascade. For both, we are able to derive some general results even before specifying the threshold distribution $P(\theta)$.

In the RIE case, $\phi^* \sim \bar\theta \ll Q$, the redistribution of loads, i.e. the network effect, plays the most important role. Note that pairs of agents connected through an edge have in general different capacities. If one of them fails, its neighbor is exposed to failure during the following time step. However, whether it fails or not will depend on (a) the load redistributed from the failing agent and (b) its own capacity. Let us assume that if an agent with capacity $\theta^{(t-1)}$ fails, the total load induced on its neighbor is $\phi^{(t)} = \alpha\theta^{(t)} + \theta^{(t-1)}/K$. This assumption neglects the contribution of agents that failed before time step $t-1$, which are terms of order $K^{-\tau}$ with $\tau \ge 2$. It means that $\theta^{(t-1)}$ is the load distributed by the agent, i.e. exactly its capacity and not more. Thus, the largest capacity of an agent failing at time $t$, $\theta_c^{(t)}$, depends on the capacity of the agent that failed in the previous time step $t-1$, i.e.

$$ \theta_c^{(t)} = \frac{\theta^{(t-1)}}{K(1-\alpha)}, \tag{20} $$

which is a lower bound for the failure condition, Eq. (19). This means that among all neighbors of the agents at layer $(t-1)$, those with a capacity lower than $\theta_c^{(t)}$ will fail.



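With the assumption (20), the fraction of failing agents at time $t$ can be decomposed in terms of the failure of two consecutive agents in a pair-wise approximation as follows:

$$ f(t) = \int_{\theta_{\min}}^{\theta_c^{(t-1)}} d\theta^{(t-1)}\, P(\theta^{(t-1)}) \int_{\theta_{\min}}^{\theta_c^{(t)}(\theta^{(t-1)})} d\theta^{(t)}\, P(\theta^{(t)}). \tag{21} $$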
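The pair-wise approximation of Eq. (21) only needs a one-dimensional integral once $\theta_c^{(t-1)}$ is known, with the inner limit $\theta_c^{(t)} = \theta^{(t-1)}/[K(1-\alpha)]$ of Eq. (20). A midpoint-rule sketch for an arbitrary density and cumulative distribution is shown below; the signature, the example distribution and the choice of quadrature are assumptions, not part of the original analysis.

```python
from typing import Callable


def f_pairwise(theta_c_prev: float, alpha: float, K: int, theta_min: float,
               pdf: Callable[[float], float], cdf: Callable[[float], float],
               n: int = 20_000) -> float:
    """Eq. (21): integrate P(theta^(t-1)) * CDF(theta^(t-1) / (K (1 - alpha)))
    over theta^(t-1) from theta_min to theta_c^(t-1)."""
    if theta_c_prev <= theta_min:
        return 0.0
    h = (theta_c_prev - theta_min) / n
    total = 0.0
    for m in range(n):
        parent = theta_min + (m + 0.5) * h
        theta_c = parent / (K * (1.0 - alpha))   # Eq. (20)
        total += pdf(parent) * cdf(theta_c) * h
    return total


def pdf_example(x: float) -> float:
    """Uniform density on [0.5, 1.5] (theta_bar = 1, sigma = 0.5)."""
    return 1.0 if 0.5 <= x <= 1.5 else 0.0


def cdf_example(x: float) -> float:
    return min(1.0, max(0.0, x - 0.5))


print(round(f_pairwise(1.0, 0.5, 2, 0.5, pdf_example, cdf_example), 6))  # 0.125
```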



This approach differs from the previous mean-field approximation in the following: now, the net effect of the load redistributed by a failing agent is taken into account to determine the fraction of its neighbors that will fail in the next time step. Thus, this approximation contains information about the heterogeneity at the edge level. On the other hand, even in the EEE regime, $\phi^* \sim Q \gg \bar\theta$, the role of the network still cannot be neglected. We are interested in the limit satisfying: (i) the load redistributed by the previously failed nodes cannot be totally neglected, i.e. $\alpha \sim 1$; (ii) the main contribution to the load $\phi^{(t)}$ comes from $\phi^*$ (i.e. from the initial load). We assume that at any point in time there exists a critical threshold $\theta_c^{(t)}$ above which agents do not fail. In this case, Eq. (19) becomes simply $\theta_c^{(t)} = \theta_c^{(t-1)}/[K(1-\alpha)]$. Then, the load passed on to agents at a distance $t$ from agent $i$ is simply given by

$$ \phi^{(t-1)} = \frac{\phi^{(t-2)}}{K(1-\alpha)} = \frac{\phi^*}{[K(1-\alpha)]^{t-1}}, \tag{22} $$

which results in the critical threshold for the EEE regime:

$$ \theta_c^{(t)}(\phi^*) = \frac{\phi^*}{[K(1-\alpha)]^{t}}. \tag{23} $$
Figure 3: Evolution of the critical threshold $\theta_c^{(t)}$ (note the log scale) versus the time step $t$ for the case of a uniform distribution with parameters $\bar\theta = 1$ (dotted line) and $\sigma = 0.5$ (dashed lines). The value of $K$ is set to four, while we used different values for the parameter $\alpha$.



This gives the critical condition for the failing threshold of an agent that is hit by the cascade at time $t$ (i.e. it belongs to the $t$-th nearest neighborhood of the initially failing agent $i$). It nicely separates two effects that determine the severity of a cascade: (a) the size of the initial shock $\phi^*$, (b) the number of neighbors that share the load and their respective safety margin, i.e. $K(1-\alpha)$. In the limit of large external shocks, and independent of further assumptions about the threshold distribution, the sequence of the critical thresholds $\theta_c^{(t)}$ crucially depends on whether the factor $K(1-\alpha)$ is larger or smaller than one. If $K(1-\alpha) > 1$, the sequence $\theta_c^{(t)}$ approaches zero exponentially, i.e. with increasing distance from the initially failing agent, the failure condition becomes harder to meet. Hence, there should be a finite $t^*$ at which all reasonably chosen threshold values $\theta_r$ are larger than the critical threshold, which implies that the cascade stops. This is shown in Fig. 3 for the case of the uniform threshold distribution for different values of the safety margin $(1-\alpha)$. We can verify for the given set of parameters that for $\alpha = 0.2$ and $\alpha = 0.5$ the cascade stops right after $t = 1$, while for $\alpha = 0.7$ it stops after $t = 3$. On the other hand, for $\alpha = 0.8$, i.e. $K(1-\alpha) < 1$, the critical threshold grows with $t$ and is already at $t = 3$ larger than any existing threshold, so the full cascade cannot be prevented.
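The role of $K(1-\alpha)$ in Eq. (23) can be made explicit by iterating the critical threshold until it drops below $\theta_{\min}$, i.e. until $f(t) = 0$. The sketch below does this; the shock size, $\theta_{\min}$ and the cut-off `max_t` are illustrative assumptions and are not the parameters behind Fig. 3.

```python
from typing import List, Optional, Tuple


def eee_critical_thresholds(phi_star: float, K: int, alpha: float, theta_min: float,
                            max_t: int = 50) -> Tuple[List[float], Optional[int]]:
    """Eq. (23): theta_c^(t) = phi_star / [K (1 - alpha)]^t for t = 1, 2, ...

    Returns the sequence and the first t with theta_c^(t) < theta_min (no agent
    can fail any more), or None if that does not happen within max_t steps.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    factor = K * (1.0 - alpha)
    sequence = []
    for t in range(1, max_t + 1):
        theta_c = phi_star / factor ** t
        sequence.append(theta_c)
        if theta_c < theta_min:
            return sequence, t
    return sequence, None


for alpha in (0.2, 0.5, 0.7, 0.8):
    _, t_stop = eee_critical_thresholds(phi_star=10.0, K=4, alpha=alpha, theta_min=0.5)
    print(alpha, t_stop)  # for alpha = 0.8, K (1 - alpha) < 1 and t_stop is None
```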

While this is an intuitive and illustrative example, in the following section we will calculate analytically the exact time $t'$ at which the cascade may stop. We note again that, due to the finite system size, cascades may stop already at time $t^\circ$, which gives an additional limit.
3 Critical conditions for systemic risk
3.1 Analytical estimations of cascade sizes 

Up to this point, we have derived a measure for systemic risk, $X(t)$, Eq. (7), that is based on the fraction $f(t)$ of agents failing at a given time $t$. Failure cascades can propagate in the system if the net fragility of agents, $\phi_r(t) - \theta_r$, is non-negative, i.e. if the load reaches or exceeds the capacity. While $\phi_r(t)$ becomes a function of the redistribution of loads in previous time steps, the capacity is determined by a threshold distribution function $P(\theta)$, for which we use three different specifications. Already the general framework outlined above allows us to expect "infinite" ($X \to 1$) and "finite" ($X < 1$) cascades, where the latter can encompass the whole system or stop before. In the following, we specify the conditions for these findings for the different threshold distributions.



Cascade size for homogeneous threshold. Let us start with the simplest case that all agents have the same threshold $\bar\theta$ and the same number of neighbors. As stated above, there is a failing agent $i$ at $t = 0$, for which $\phi_i(0) = \phi^*$. Because all thresholds are equal, if an agent $r$ fails due to the failure of one of its neighbors, $r^*$, then all the neighbors of $r^*$ will fail as well, i.e. $f(t) = 1$ if $f(t-1) = 1$. Let $\phi^{(t)}$ be the load of agents at a distance $t$ from the initially failing agent at $t = 0$, and let us assume that agents at a distance lower than $t$ have already failed. Then, the load of agents in shell $t$ is

$$ \phi^{(t)} = \alpha\bar\theta + \frac{\phi^{(t-1)}}{K}. \tag{24} $$

With the initial condition $\phi^{(0)} = \phi^*$, this recursive equation can be easily solved, yielding

$$ \phi^{(t)} = \alpha\bar\theta\, \frac{1 - K^{-t}}{1 - K^{-1}} + \frac{\phi^*}{K^{t}}. \tag{25} $$

I.e., agents exposed to the redistribution of load at time $t$ will fail if $\phi^{(t)} \ge \bar\theta$. This equation allows us to gain insight into the cascade mechanism. On the one hand, according to the above discussion of the EEE regime, infinite cascades can only be triggered if $K(1-\alpha) < 1$, irrespective of the threshold distribution. For the homogeneous case this can be read off directly from Eq. (25): for $t \to \infty$ the load approaches $\alpha\bar\theta K/(K-1)$, which stays below $\bar\theta$ whenever $K(1-\alpha) > 1$. This means that a topological effect (the number and safety margin of the neighbors among which the load is redistributed) decides between finite and infinite cascades.

On the other hand, when $K(1-\alpha) > 1$, the initial load $\phi^*$ cannot by itself trigger an infinite cascade in the case of a homogeneous threshold. I.e. even in the EEE regime, a cascade will only last $t^*$ time steps, where $t^*$ results from the failing condition $\bar\theta > \phi^{(t^*)}$. From the condition $f(t^*) = 0$, we can compute

$$ t^* = \frac{\log\{(1 - K^{-1})\phi^* - \alpha\bar\theta\} - \log\{(1 - K^{-1})\bar\theta - \alpha\bar\theta\}}{\log K}. \tag{26} $$
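Eqs. (24)-(26) can be cross-checked in a few lines: the recursion (24) and the closed form (25) give the same loads, and Eq. (26) is the real root of $\phi^{(t)} = \bar\theta$. Because a shell fails as long as $\phi^{(t)} \ge \bar\theta$, the sketch takes the first shell without failures to be the integer $\lfloor t^* \rfloor + 1$; this rounding convention is our reading and is not stated explicitly in the text. Function names are hypothetical.

```python
import math
from typing import List, Optional


def loads_recursive(phi_star: float, alpha: float, theta: float, K: int, t_max: int) -> List[float]:
    """Eq. (24): phi^(t) = alpha * theta + phi^(t-1) / K, with phi^(0) = phi_star."""
    loads = [phi_star]
    for _ in range(t_max):
        loads.append(alpha * theta + loads[-1] / K)
    return loads


def load_closed_form(t: int, phi_star: float, alpha: float, theta: float, K: int) -> float:
    """Eq. (25)."""
    return alpha * theta * (1 - K ** -t) / (1 - 1 / K) + phi_star / K ** t


def t_star_homogeneous(phi_star: float, alpha: float, theta: float, K: int) -> Optional[float]:
    """Eq. (26); None if K (1 - alpha) <= 1, where the load never drops below theta."""
    if K * (1 - alpha) <= 1:
        return None
    q = 1 - 1 / K
    return (math.log(q * phi_star - alpha * theta) - math.log(q * theta - alpha * theta)) / math.log(K)


phi_star, alpha, theta, K = 10.0, 0.5, 1.0, 4
print(loads_recursive(phi_star, alpha, theta, K, 3))   # [10.0, 3.0, 1.25, 0.8125]
root = t_star_homogeneous(phi_star, alpha, theta, K)    # log(28) / log(4), about 2.40
print(root, math.floor(root) + 1)                        # first shell without failures: 3
```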
As discussed before, the actual time $t'$ at which the cascade stops is $t' = \min(t^\circ, t^*)$, where $t^\circ$ denotes the time at which the cascade reaches the system size, Eq. (8).

Knowing $t'$, we can further calculate the systemic risk according to Eq. (7), with $f(t) = 1$, i.e. $F(t) = K(t)$ for $t < t'$ and $F(t) = 0$ otherwise. We find

$$ X(t') = \frac{K^{t'} - 1}{N(K-1)} \tag{27} $$

for a Bethe lattice or a tree with coordination number $K$, and

$$ X(t') \propto \frac{K\, t'^2}{N} \tag{28} $$

for a regular lattice, with the exact factor depending on the topology.
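With $f(t) = 1$ up to the stopping time, Eq. (7) reduces to a geometric sum on a tree and to an arithmetic one on a lattice. The sketch below evaluates both exactly; for the lattice it assumes $K(t) = K t$ with prefactor one, which is one concrete choice behind the proportionality in Eq. (28). Names are hypothetical.

```python
def risk_bethe(t_prime: int, K: int, N: int) -> float:
    """Eq. (27): X(t') = (K^t' - 1) / (N (K - 1)), i.e. shells 0..t'-1 fail completely."""
    return (K ** t_prime - 1) / (N * (K - 1))


def risk_lattice(t_prime: int, K: int, N: int) -> float:
    """(1/N) [1 + sum_{t=1}^{t'-1} K t] = (1 + K t'(t'-1)/2) / N, which grows like K t'^2 / N."""
    return (1 + K * t_prime * (t_prime - 1) / 2) / N


print(risk_bethe(3, 4, 100), risk_lattice(3, 4, 100))  # 0.21 0.13
```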

Cascade size for uniform threshold distribution. We now turn to the simplest case that allows some heterogeneity in the agents' thresholds, which is the uniform distribution $P(\theta) = U(\bar\theta - \sigma, \bar\theta + \sigma)$. The failing condition in Eq. (12) for the nearest neighbors still holds, but the question is how often we find thresholds below the critical limit:

$$ f(1) = \int_{\theta_{\min}}^{\theta_c^{(1)}} P(\theta)\, d\theta = \frac{\theta_c^{(1)} - \theta_{\min}}{2\sigma}, \tag{29} $$

valid as long as $\theta_c^{(1)} \le \bar\theta + \sigma$ (beyond that, $f(1) = 1$). With $\theta_c^{(1)}$ given by Eq. (12), it turns out that the fraction of failing agents at the first time step is

$$ f(1) = \frac{\phi^*/[K(1-\alpha)] - (\bar\theta - \sigma)}{2\sigma} \qquad \text{if } \frac{\phi^*}{\bar\theta - \sigma} > K(1-\alpha), \tag{30} $$

and $f(1) = 0$ otherwise. Regarding $f(t)$, we know from Eq. (18) that the fraction of failing agents at any time step crucially depends on the history of failed agents, i.e. on the path connecting the initially failed agent with the currently failing one. Therefore, in general, the process is not solvable, and a simplifying assumption is needed to break the integral expression in Eq. (18) into solvable parts.



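For the EEE regime, we use Eq. (23) and the underlying assumptions to obtain the closed equation

$$f(t) = \frac{1}{2\sigma}\left[\frac{\phi^{\pm}}{[K(1-\alpha)]^{t}} - (\bar\theta - \sigma)\right]. \qquad (31)$$

With this expression, we find from $f(t^{*}) = 0$ the time at which the cascade stops for the uniform threshold distribution:

$$t^{*} = \frac{\log(\phi^{\pm}) - \log(\bar\theta - \sigma)}{\log[K(1-\alpha)]}. \qquad (32)$$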

Again, the time at which the cascade ends is given by $t' = \min(t^{\circ}, t^{*})$, with $t^{\circ}$ as defined above.
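To make the uniform-distribution results concrete, the following minimal sketch evaluates the EEE approximation. It assumes, as in the EEE treatment above, that the critical threshold at step $t$ is $\phi^{\pm}/[K(1-\alpha)]^{t}$, so that $f(t) = [\phi^{\pm}/[K(1-\alpha)]^{t} - (\bar\theta - \sigma)]/(2\sigma)$. Clipping $f$ to $[0, 1]$ and the Monte Carlo check of $f(1)$ are our additions, and all function names are hypothetical.

```python
import math
import random


def f_uniform_eee(t, phi, theta_bar, sigma, alpha, K):
    """Fraction failing at step t for U(theta_bar - sigma, theta_bar + sigma).

    Assumes the critical threshold at step t is phi / [K(1 - alpha)]**t and
    clips the result to [0, 1] (our addition, since f is a fraction).
    """
    crit = phi / (K * (1.0 - alpha)) ** t
    f = (crit - (theta_bar - sigma)) / (2.0 * sigma)
    return min(1.0, max(0.0, f))


def t_star_uniform(phi, theta_bar, sigma, alpha, K):
    """Solution of crit(t) = theta_bar - sigma, i.e. f(t*) = 0."""
    return (math.log(phi) - math.log(theta_bar - sigma)) / math.log(K * (1.0 - alpha))


def f1_monte_carlo(phi, theta_bar, sigma, alpha, K, n=100_000, seed=1):
    """Fraction of sampled thresholds below the step-1 critical value."""
    rng = random.Random(seed)
    crit = phi / (K * (1.0 - alpha))
    hits = sum(rng.uniform(theta_bar - sigma, theta_bar + sigma) <= crit
               for _ in range(n))
    return hits / n


args = dict(phi=1.2, theta_bar=1.0, sigma=0.5, alpha=0.5, K=4)
print(f_uniform_eee(1, **args))   # 0.1 up to rounding
print(f1_monte_carlo(**args))     # close to 0.1
print(t_star_uniform(**args))     # approx. 1.26
```

For these illustrative parameters, $f(1) = 0.1$ agrees with the sampled fraction, and $f$ drops to zero after $t^{*} \approx 1.26$, so the cascade stops at the second step unless $t^{\circ}$ is reached first.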
Considering instead the RIE regime, $\phi^{\pm} \sim \bar\theta$, where redistribution effects play a major role, we use Eq. (21) and the underlying assumptions. Neglecting capacity-capacity correlations among agents, we find for the case of the uniform threshold distribution:



/(*) 



\t-i) 



7 (;- 



+ [l-2K(l-a)}(6-a) 



8a 2 K(l - a) 



(33) 



The time at which the cascade stops is, as in the previous cases, given by $t' = \min(t^{\circ}, t^{*})$, where $t^{*}$ is the first time step that satisfies $f(t^{*}) = 0$, and $t^{\circ}$ is defined as before.



Cascade size for power-law threshold distribution. Now we discuss the case where the agents' thresholds follow a power-law distribution, Eq. (5), and can vary by orders of magnitude. With the same procedure as before, we determine the fraction of failed agents during the initial cascade as

$$f(1) = \int_{\theta_{\min}}^{\theta^{(1)}_{\mathrm{crit}}} P(\theta)\,d\theta = 1 - \left[\frac{\phi^{\pm}}{\theta_{\min}\,K(1-\alpha)}\right]^{1-\gamma}. \qquad (34)$$

So, cascades are obtained if $\phi^{\pm}/\theta_{\min} > K(1-\alpha)$.



Considering first the EEE regime, we use the approximation given by Eq. (22) and find for the fraction of failing agents during time step $t$:

$$f(t) = 1 - \left[\frac{\phi^{\pm}}{\theta_{\min}}\right]^{1-\gamma}[K(1-\alpha)]^{(\gamma-1)t}. \qquad (35)$$

From $f(t^{*}) = 0$, we calculate the time when the cascade stops as

$$t^{*} = \frac{\log(\phi^{\pm}) - \log(\theta_{\min})}{\log[K(1-\alpha)]}. \qquad (36)$$

Again, following our previous discussion, the cascade stops at $t' = \min(t^{*}, t^{\circ})$.
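The power-law case can be sketched in the same way. The Python below assumes the EEE critical threshold $\phi^{\pm}/[K(1-\alpha)]^{t}$ and thresholds distributed as $P(\theta) = (\gamma-1)\theta_{\min}^{\gamma-1}\theta^{-\gamma}$ for $\theta \geq \theta_{\min}$, which is the normalized power law consistent with Eq. (34) (an assumption, since Eq. (5) is not reproduced here). Thresholds are sampled with the standard-library `random.paretovariate`, whose shape parameter is $\gamma - 1$. Function names are hypothetical.

```python
import math
import random


def f_powerlaw_eee(t, phi, theta_min, gamma, alpha, K):
    """Fraction failing at step t (EEE regime), equivalent to Eq. (35).

    Assumes the critical threshold at step t is phi / [K(1 - alpha)]**t;
    below theta_min no agent can fail, so f is set to 0 there.
    """
    crit = phi / (K * (1.0 - alpha)) ** t
    if crit <= theta_min:
        return 0.0
    return 1.0 - (crit / theta_min) ** (1.0 - gamma)


def t_star_powerlaw(phi, theta_min, alpha, K):
    """Eq. (36): step at which the critical threshold reaches theta_min."""
    return (math.log(phi) - math.log(theta_min)) / math.log(K * (1.0 - alpha))


def f_monte_carlo(t, phi, theta_min, gamma, alpha, K, n=100_000, seed=2):
    """Fraction of sampled power-law thresholds below the step-t critical value."""
    rng = random.Random(seed)
    crit = phi / (K * (1.0 - alpha)) ** t
    hits = sum(theta_min * rng.paretovariate(gamma - 1.0) <= crit for _ in range(n))
    return hits / n


phi, theta_min, gamma, alpha, K = 20.0, 1.0, 3.0, 0.5, 4
for t in (1, 2, 3, 4):
    print(t, f_powerlaw_eee(t, phi, theta_min, gamma, alpha, K),
          f_monte_carlo(t, phi, theta_min, gamma, alpha, K))
print("t* =", t_star_powerlaw(phi, theta_min, alpha, K))  # log2(20), approx. 4.32
```

The analytic and sampled fractions should agree to within sampling noise; with $\phi^{\pm} = 20$, $\theta_{\min} = 1$ and $K(1-\alpha) = 2$, the cascade stops at $t^{*} = \log_2 20 \approx 4.32$.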



In the RIE regime, on the other hand, Eq. (21) has to be applied to the power-law distribution to yield

\K{l-a^-' 



fit) = 1 



7 (*-i). 



27-2' 



1 



'(*-!), 



(37) 



which, together with the failure condition of Eq. (20), provides a set of recursive equations to be solved numerically.
Figure 4: Systemic risk in a system with uniform threshold distribution with $\bar\theta = 1$ and $\sigma = 0.5$. The contour plots show $X$ as a function of the initial shock normalized to the network capacity $Q$, for different values of $\alpha$. Left: Bethe lattice; right: regular network. $K = 4$, so the dashed line $K(1-\alpha) = 1$ separates the region of "infinite" cascades, $X(t') = 1$, from the region of possibly finite cascades, $X(t') < 1$.

3.2 Numerical results for the EEE regime 

Size of finite cascades. The analytical results for $f(t^{*})$ and $t' = \min(t^{*}, t^{\circ})$ now allow us to calculate the systemic risk $X(t')$ for the different threshold distributions, by varying system parameters such as the safety margin $(1-\alpha)$ or the network topology. In this section, we first concentrate on the EEE regime, where the external shock is comparable to the network capacity and much larger than the average threshold of an agent, $\phi^{\pm} \sim Q \gg \bar\theta$.

In Fig. 4 we compare, for the uniform threshold distribution, the systemic risk $X$ in Bethe and 2D regular lattices. We recall that the difference lies in the number of agents potentially affected by the cascade at a given time $t$. For Bethe lattices and regular trees, we have $K(t) = K^{t}$, whereas for regular networks $K(t) \propto Kt$, i.e. for a given $t$ many more agents are affected in Bethe lattices. Conversely, for a given $N$, regular lattices have a larger diameter. For example, for the 2D regular lattice the diameter grows with system size as $\sqrt{N}$, whereas the diameter of a Bethe lattice grows as $\log N$.
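The difference in the number of agents reached per step can be made tangible with a few lines of Python. This sketch takes $K(t) = K^{t}$ for the Bethe lattice and $K(t) = Kt$ for the regular network (for $K = 4$ this is exactly the number of sites at distance $t$ on an infinite square lattice), ignores boundary effects, and counts the steps needed until at least $N$ agents have been reached. The helper name `steps_to_reach` is hypothetical.

```python
def steps_to_reach(N, shell_size):
    """Smallest t with 1 + sum_{tau=1..t} shell_size(tau) >= N.

    shell_size(t) is the number of agents first reached at step t, i.e. K(t).
    Boundary effects of a finite network are ignored (assumption).
    """
    reached, t = 1, 0  # start from the initially failing agent
    while reached < N:
        t += 1
        reached += shell_size(t)
    return t


K = 4
for N in (1_000, 10_000):
    bethe = steps_to_reach(N, lambda t: K ** t)   # K(t) = K^t
    square = steps_to_reach(N, lambda t: K * t)   # K(t) = K t (4t on a square lattice)
    print(N, bethe, square)
# 1000 5 22
# 10000 7 71
```

Going from $N = 1000$ to $N = 10000$ adds only two steps on the Bethe lattice but roughly triples the number of steps on the square lattice, in line with the $\log N$ versus $\sqrt{N}$ scaling of the diameters.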

In both cases, trivially, if the safety margin $(1-\alpha)$ vanishes, any external shock results in an immediate collapse. This also happens for finite safety margins as long as the global stability condition $K(1-\alpha) > 1$ is not met. On the other hand, for large safety margins, $\alpha \to 0$, we do expect finite cascades. As the plots show, the parameter region for these is much larger for regular networks, where at each time step a smaller number of agents is affected, than for Bethe lattices. As shown, in the Bethe lattice (left panel) an initial shock of almost 30% of the network capacity already leads to a full cascade, whereas in the regular network (right panel) an initial shock of almost 60% of the system's available capacity is required for a full cascade.

Figure 5: Systemic risk in a system with power-law threshold distribution: (left) $\gamma = 1.5$, (right) $\gamma = 3.0$, for a Bethe lattice with $K = 4$, $N = 1000$. Cf. also Fig. 4.

The results are to be compared with Fig. 5, where we plot the systemic risk $X$, for Bethe lattices only, for power-law threshold distributions with two different exponents $\gamma$. We recall that the network capacity $Q$ is comparable to that of the uniform threshold distribution only for $\gamma = 3$, Fig. 5 (right), where we also observe a similar dependence of $X$ on the safety margin $(1-\alpha)$ and on the relative initial load $\phi^{\pm}/Q$. To be precise, in this case the safety margin plays a less important role, but we find finite cascades also for $\phi^{\pm}/Q > 0.5$, which was not the case for the uniform threshold distribution.

The situation differs for $\gamma = 1.5$, as Fig. 5 (left) shows. Here the system seems to be much more vulnerable, as indicated by large values of $X$ in all parameter regions. To put this finding into perspective, we recall that for $\gamma < 2$ the network capacity is much larger than for $\gamma > 2$, i.e. the initial shock also takes much higher values than in Fig. 5 (right). This explains the severity of the cascades in this case.
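The system-size dependence of the network capacity for $\gamma < 2$ can be illustrated numerically. The sketch assumes that $Q$ is the sum of all thresholds, $Q = \sum_i \theta_i$ (our reading of "the total load the system can carry a priori"), and samples thresholds from the normalized power law $P(\theta) = (\gamma-1)\theta_{\min}^{\gamma-1}\theta^{-\gamma}$ via `random.paretovariate(gamma - 1)`. For $\gamma > 2$ the mean threshold is finite, $\theta_{\min}(\gamma-1)/(\gamma-2)$, i.e. $2\theta_{\min}$ for $\gamma = 3$, so $Q/N$ settles; for $\gamma \leq 2$ the mean diverges and $Q/N$ keeps growing with $N$. The function name `capacity` is hypothetical.

```python
import random


def capacity(N, gamma, theta_min=1.0, seed=0):
    """Network capacity Q = sum of N i.i.d. power-law thresholds (assumption).

    random.paretovariate(a) returns x >= 1 with P(x > y) = y**(-a), so the
    shape a = gamma - 1 gives P(theta) proportional to theta**(-gamma).
    """
    rng = random.Random(seed)
    return sum(theta_min * rng.paretovariate(gamma - 1.0) for _ in range(N))


for N in (1_000, 10_000, 100_000):
    q3 = capacity(N, gamma=3.0)    # finite mean 2*theta_min: Q/N typically stays near 2
    q15 = capacity(N, gamma=1.5)   # infinite mean: Q/N typically keeps growing with N
    print(f"N={N:>6}  Q/N (gamma=3) = {q3 / N:8.3f}  Q/N (gamma=1.5) = {q15 / N:12.1f}")
```

Because both calls use the same seed, each $\gamma = 1.5$ sample is the corresponding $\gamma = 3$ sample raised to the fourth power, so the comparison between the two exponents is deterministic rather than a matter of luck.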

The results obtained for these two particular values of $\gamma$ can be generalized to $\gamma < 2$ and $\gamma > 2$. As a consequence, we may conclude that with a much broader threshold distribution the system can absorb higher initial shocks (in absolute values), but shocks of a size comparable to the network capacity most likely result in infinite cascades, i.e. total failure.

To better understand the role of the skewness of the threshold distribution and of the topology, we fix the relative initial shock at $\phi^{\pm}/Q = 0.2$ and vary $\gamma$ for two different network topologies.
Figure 6: Systemic risk in a system with power-law threshold distribution. The relative initial shock was fixed at $\phi^{\pm}/Q = 0.2$; $N = 1000$, $K = 4$. (Left) Bethe lattice, (right) 2D regular lattice. Cf. also Fig. 5.

Fig. 6 confirms (a) that a Bethe lattice or regular tree structure leads to more severe cascades than a regular network, owing to the smaller diameter of the network, and (b) that increasing skewness, i.e. smaller values of $\gamma$, leads to increasing systemic risk. Remarkably, there is a non-monotonic dependence of $X$ on $\alpha$ and $\gamma$, and the cascade size becomes larger around $\gamma \approx 1.5$. The reason for this is, on the one hand, the system-size dependence of the network capacity for $\gamma < 2$ and, on the other hand, the larger fragility resulting from a more skewed distribution (i.e. thresholds are closer to $\theta_{\min}$). The first effect is a global one, i.e. a larger load is added to the system; the second is a local one, i.e. there are fewer agents that can handle large loads.

3.3 Results for the RIE regime 

The previous results have focused on the EEE regime of large external shocks, $\phi^{\pm} \sim Q \gg \bar\theta$. This means that the initial load is largely responsible for triggering the cascades. Here we focus on the opposite case, the RIE regime, $Q \gg \phi^{\pm} \sim \bar\theta$, where small fluctuations of an agent's load lead to the agent's failure, provided that the safety margin of that agent was rather small. The question is then under which conditions this failure leads to a cascade of macroscopic size. Applications
of this case include power-grid cascade failures [U [5J [10], or failures of server infrastructure [6]. 

We now consider $\phi^{\pm} = \theta_i$ for the failing agent, and we assume that the system is in a critical condition. This means that a single failure among the neighbors of the initially failing agent (i.e. $f(1) \geq 1/K$) is enough to trigger a system-wide cascade. We now define $\phi_1$ as the load such that $f(1)$ is exactly equal to $1/K$. With these assumptions, we compute the ensemble average $\langle X\rangle$ of the systemic risk

$$\langle X\rangle = \int_{\phi_1} P(\theta_i)\,d\theta_i. \qquad (38)$$

The integral runs over the thresholds $\theta_i \geq \phi_1$ of all agents $i$ whose failure triggers a full cascade. These agents can be regarded as systemically important, because their failure induces a systemic collapse.

The quantity $\langle X\rangle$ then represents the frequency with which a randomly chosen agent can trigger the complete failure of the system. In the following, we compute $\langle X\rangle$ for the uniform and power-law threshold distributions.



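Uniform threshold distribution. The initial critical load $\phi_1$ in the RIE regime can easily be computed from Eq. (30), using the condition $f(1) = 1/K$:

$$\phi_1 = K(1-\alpha)\left[\frac{2\sigma}{K} + (\bar\theta - \sigma)\right]. \qquad (39)$$

Then, the average systemic risk for the uniform distribution, $\langle X_u\rangle$, is given by

$$\langle X_u\rangle = \begin{cases} 1 & \text{if } \bar\theta - \sigma > \phi_1, \\ \dfrac{(2\alpha - 1) + K(1-\alpha)}{2} - \dfrac{\bar\theta\,[K(1-\alpha) - 1]}{2\sigma} & \text{if } \bar\theta - \sigma \leq \phi_1 \leq \bar\theta + \sigma, \\ 0 & \text{if } \phi_1 > \bar\theta + \sigma. \end{cases} \qquad (40)$$

The middle case is simply $(\bar\theta + \sigma - \phi_1)/(2\sigma)$, the fraction of thresholds above $\phi_1$, with $\phi_1$ taken from Eq. (39).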
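Eqs. (39) and (40) are easy to evaluate and to cross-check against the direct fraction $(\bar\theta + \sigma - \phi_1)/(2\sigma)$ of thresholds above $\phi_1$. The sketch below uses the Fig. 7 settings $N = 1000$, $(1-\alpha) = 0.002$, $\bar\theta = 1$; the function names are hypothetical, and $\sigma > 0$ is assumed (the identical-agent case $\sigma = 0$ is a step function).

```python
def phi1_uniform(theta_bar, sigma, alpha, K):
    """Eq. (39): load phi_1 at which f(1) = 1/K for U(theta_bar - sigma, theta_bar + sigma)."""
    return K * (1.0 - alpha) * (2.0 * sigma / K + (theta_bar - sigma))


def mean_X_uniform(theta_bar, sigma, alpha, K):
    """Eq. (40): fraction of agents whose threshold exceeds phi_1 (requires sigma > 0)."""
    phi1 = phi1_uniform(theta_bar, sigma, alpha, K)
    if phi1 < theta_bar - sigma:
        return 1.0
    if phi1 > theta_bar + sigma:
        return 0.0
    ka = K * (1.0 - alpha)
    return ((2.0 * alpha - 1.0) + ka) / 2.0 - theta_bar * (ka - 1.0) / (2.0 * sigma)


def mean_X_direct(theta_bar, sigma, alpha, K):
    """Same quantity computed directly as P(theta >= phi_1), clipped to [0, 1]."""
    phi1 = phi1_uniform(theta_bar, sigma, alpha, K)
    return min(1.0, max(0.0, (theta_bar + sigma - phi1) / (2.0 * sigma)))


# Fig. 7 (left) settings: N = 1000, safety margin 1 - alpha = 0.002
alpha, N = 0.998, 1000
print("critical K/N:", 1.0 / ((1.0 - alpha) * N))  # K(1 - alpha) = 1, approx. 0.5
for K in (400, 600, 900):
    print(K / N, mean_X_uniform(1.0, 0.3, alpha, K), mean_X_direct(1.0, 0.3, alpha, K))
```

Below the critical point $K/N = 1/2$, i.e. $K(1-\alpha) = 1$, every agent can trigger a full cascade; for $K = 600$ and $\sigma = 0.3$ both routes give $\langle X_u\rangle \approx 0.765$.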



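Power-law threshold distribution. In an analogous way, we obtain from Eq. (34) with $f(1) = 1/K$ the expression for the initial critical load in the case of a power-law threshold distribution:

$$\phi_1 = \theta_{\min}\,K(1-\alpha)\left(1 - K^{-1}\right)^{-1/(\gamma-1)}. \qquad (41)$$

This allows us to calculate the average systemic risk as

$$\langle X\rangle = \begin{cases} \left(1 - K^{-1}\right)[K(1-\alpha)]^{1-\gamma} & \text{if } \phi_1 > \theta_{\min}, \\ 1 & \text{otherwise.} \end{cases} \qquad (42)$$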
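Similarly, Eqs. (41) and (42) can be checked by sampling thresholds and counting how many exceed $\phi_1$. This Python sketch assumes the normalized power law $P(\theta) = (\gamma-1)\theta_{\min}^{\gamma-1}\theta^{-\gamma}$, sampled with `random.paretovariate(gamma - 1)`, and the Fig. 7 settings $(1-\alpha) = 0.002$ with $K = 600$; the function names are hypothetical.

```python
import random


def phi1_powerlaw(theta_min, gamma, alpha, K):
    """Eq. (41): load phi_1 at which f(1) = 1/K (requires gamma > 1, K > 1)."""
    return theta_min * K * (1.0 - alpha) * (1.0 - 1.0 / K) ** (-1.0 / (gamma - 1.0))


def mean_X_powerlaw(theta_min, gamma, alpha, K):
    """Eq. (42): fraction of agents with threshold >= phi_1."""
    if phi1_powerlaw(theta_min, gamma, alpha, K) > theta_min:
        return (1.0 - 1.0 / K) * (K * (1.0 - alpha)) ** (1.0 - gamma)
    return 1.0


def mean_X_monte_carlo(theta_min, gamma, alpha, K, n=100_000, seed=3):
    """Estimate P(theta >= phi_1) by sampling thresholds (Pareto shape gamma - 1)."""
    rng = random.Random(seed)
    phi1 = phi1_powerlaw(theta_min, gamma, alpha, K)
    hits = sum(theta_min * rng.paretovariate(gamma - 1.0) >= phi1 for _ in range(n))
    return hits / n


for gamma in (1.1, 2.0, 3.0, 4.0):
    print(gamma, mean_X_powerlaw(1.0, gamma, 0.998, 600),
          mean_X_monte_carlo(1.0, gamma, 0.998, 600))
```

For $K(1-\alpha) > 1$ the analytic value decreases monotonically with $\gamma$, matching the trend that broader distributions (lower $\gamma$) carry a higher probability of complete failure.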



In order to study the role of connectivity in the cascade process, we created random regular networks [12] with arbitrary values of the connectivity $K/N$. Fig. 7 shows the average cascade size $\langle X\rangle$ for a rather small safety margin, in line with the RIE regime. This means that small fluctuations of the load of a single agent may lead to its failure. Fig. 7 allows us to compare the analytical expressions in Eqs. (40) and (42) with numerical simulations of the cascade process. The graphs show a sharp transition from $\langle X\rangle = 1$ to $\langle X\rangle < 1$ at $K/N = 1/2$. This results directly from the change in global stability at that point: with $(1-\alpha) = 0.002$ and $N = 1000$, the condition $K(1-\alpha) = 1$ corresponds to $K = 500$, i.e. $K/N = 1/2$. On the other hand, when the system is not globally unstable, the results show that only a subset of agents is able to trigger a cascade in the system, their fraction being given by $\langle X\rangle$.

Figure 7: Frequency of "infinite" cascades $\langle X\rangle$ for the uniform (left) and the power-law (right) threshold distribution in the RIE regime. We consider random regular networks of size $N = 1000$ with varying $K$; the safety margin of agents is set to $(1-\alpha) = 0.002$. Open symbols show simulation results, dashed lines analytical results. Parameters for the uniform distribution: $\bar\theta = 1$, $\sigma = 0$ (triangles), $\sigma = 0.3$ (diamonds), $\sigma = 0.6$ (squares), and $\sigma = 0.9$ (circles). Parameters for the power-law distribution: $\gamma = 1.1$ (circles), $\gamma = 2$ (squares), $\gamma = 3$ (diamonds), and $\gamma = 4$ (triangles).

The left panel of Fig. 7 shows the results for the uniform threshold distribution. For identical agents ($\sigma = 0$), a sharp transition between complete failure ($\langle X\rangle = 1$) and no failure ($\langle X\rangle = 0$) can be observed. This result follows immediately from Eq. (25) in the limiting case $t \to \infty$. For a fixed value $K/N > 1/2$, the graphs show that larger values of $\sigma$ lead to larger frequencies of full cascades, $\langle X\rangle$. This follows from Eq. (39), which shows that the critical initial load $\phi_1$ decreases with increasing heterogeneity; thus, for larger values of $\sigma$, a larger fraction of agents is likely to lie above this threshold.

The right panel of Fig. 7 shows the results for the power-law threshold distribution. Again, broader distributions, i.e. lower values of $\gamma$, result in a higher probability of complete failures. This is in line with Eq. (42), where, for fixed $K$ with $K(1-\alpha) > 1$, $\langle X\rangle = (1-K^{-1})[K(1-\alpha)]^{1-\gamma}$ decreases with increasing $\gamma$. At the same time, the threshold distributions become narrower with larger $\gamma$. Thus, the number of systemically important agents able to trigger a full cascade is much lower for distributions with large $\gamma$, and the average systemic risk decreases.
4 Conclusions

The model proposed in this paper is based on very simple ingredients to allow for an analytical treatment: (i) a regular network, i.e. all agents have the same number of neighbors, $K$; (ii) a constant safety margin $(1-\alpha)$, the same for all agents, which defines a fixed relation between the load $\phi_i$ an agent can carry and the threshold $\theta_i$ at which it fails; (iii) an initial condition in which only one randomly chosen agent $i$ fails when facing a load $\phi^{\pm}$; (iv) a redistribution of the load of the failing agent to its $K$ neighbors.

The variability of the model comes from two assumptions: (v) the threshold distribution $P(\theta)$, which was chosen as a delta distribution, a uniform distribution, or a power-law distribution; (vi) the severity of the initial shock, which was either of the order of the network capacity $Q$, i.e. much larger than the average threshold, or much smaller than the network capacity, i.e. of the order of the average threshold. The latter allowed us to distinguish between two important regimes: (a) the EEE regime, where an extreme external event was large enough to cause the failure of many agents, and (b) the RIE regime, where small random internal events may result in the failure of a single agent. In both cases, this initial failure may trigger a cascade of failures among neighboring agents.

For both regimes, we are interested in the following question: what is the possible size of a failure cascade, $X(t)$, measured as the total number of failed agents relative to the system size $N$, at a given time $t$? Will this be an "infinite" ($X \to 1$) or a "finite" ($X < 1$) cascade and, in the latter case, at what time will it stop: at a time $t^{\circ}$ at which the cascade has already reached all agents but did not cause all of them to fail, or at a time $t^{*}$ before it has passed through the entire system? We regard $X$ as a measure of systemic risk.

For both regimes, we are able to derive analytical results that answer these questions. These results allow us to draw conclusions about the conditions that lead to a smaller systemic risk. In the following, we summarize our findings:

(1) We derive a global stability condition $K(1-\alpha) > 1$ that has to be met in order to allow for finite cascades, in principle. The larger the number of neighboring agents, $K$, or the larger the safety margin, $(1-\alpha)$, the more likely this condition is met. This allows for an interesting discussion because of the possible trade-off between the two ingredients. In most cases, the safety margin is given by the technical constitution of an agent, e.g. in power grids or routing servers. $K$, on the other hand, refers to the network topology rather than to internal properties of agents. Hence, systemic risk can be reduced by increasing the network density, at least up to a certain point [7]. It should be noted that we have assumed the initial failure of only one agent here. If, however, it is more costly to improve the network connectivity than to increase the safety margin of the agents, the latter can serve the same purpose, namely reducing systemic risk.
(2) In a system of thousands of agents, the network capacity $Q$, which is the total load the system can carry a priori, is also quite large. One would not easily assume that an initial shock $\phi^{\pm}$ is of the same magnitude as $Q$, as in the EEE regime. Hence, such big shocks are extreme external events for the system. The interesting finding of this paper is that even such extreme events may not lead to an "infinite" cascade, i.e. to a total collapse of the system. Instead, provided that the global stability condition is met, we find a broad range of system parameters where such cascades stop at finite time, affecting only part of the system. We have shown that systemic risk resulting from extreme shocks can be reduced (a) by a regular lattice structure (as opposed to, e.g., a regular tree structure) and (b) by a broad threshold distribution. In the latter case, we found finite cascades, i.e. $X < 1$, even if the initial shock was 2 to 4 times larger than the total network capacity, which can be regarded as a sign of real robustness. Comparing the power-law threshold distributions with $\gamma = 1.5$ and $\gamma = 3$, we found that, in absolute measures of the shock, the broader distribution leads to more robust systems. In relative measures, however, this result is inverted, simply because a broader distribution also results in a larger network capacity, and hence in larger initial shocks, while the relative measure remains the same.

(3) Investigating the RIE regime, where the initial shock was of the order of the threshold of an average agent, i.e. much smaller than the total network capacity, we find that a systemic failure can occur only if (a) the initial shock is larger than or equal to the threshold of the initially failing agent, and (b) the redistribution of load is large enough. Hence, depending on the threshold distribution, we can calculate this failure probability. Even in the globally stable regime, we find "infinite" cascades, but the probability of their occurrence depends on the probability that a randomly chosen agent fails initially. The broader the threshold distribution, the more likely this condition is met, i.e. the frequency of observing "infinite" cascades increases with heterogeneity.

(4) The initial question, "How big is too big?", can from this perspective be answered as follows: initial shocks, even if they exceed the capacity of the whole system (not just the capacity of a single agent), are probably not the problem. Of course, there are parameter regimes that lead to complete collapse ($X \to 1$). At the same time, we see that a change of 10 or even 50 percent in the initial shock does not change the systemic risk very much. System parameters related to the network topology, the safety margin, and the threshold distribution have a much larger influence.
As it was also found in other papers |T], an optimal heterogeneity in the agent's threshold can 
reduce systemic risk considerably. In addition, we find that a change of the safety margin by 10 or 50 percent has a much larger impact on systemic risk than a comparable change in the external shock. So, when seeking protection against systemic risk, the focus should be (a) on those parameters that influence global stability, i.e. $K(1-\alpha)$ (see above), and (b) on the optimal heterogeneity of the threshold distribution.